Abstract. This study builds upon the theoretical foundations of a special
class of complex networks called Alphabetic Bipartite Networks (a-BiNs),
which we introduced earlier in Europhysics Letters 79, 28001 (2007). This
class of networks is appropriate for modeling discrete combinatorial systems
(DCS), where the basic building blocks are a finite set of elementary units,
such as codons in a DNA sequence or words in a language, while different
discrete combinations of these units can give rise to a potentially infinite
number of genes or sentences. In this paper, we study the network of shared
discrete combinations, which is the one-mode projection of the a-BiN onto the
elementary units alone. The topology of such a network can provide important
insights into the structure of the underlying DCS. The general assumption in
the literature for such an analysis is that the sizes of the discrete
combinations are fixed to a constant. However, real-world DCSs present us with
instances where this size varies; in the current analysis we therefore relax
this assumption and treat these sizes as random variables sampled from a
particular distribution. An important observation is that the size
distribution affects the degree distribution of the alphabet nodes in the
one-mode projection, although it does not affect the degree distribution of
these nodes in the a-BiN itself. We derive approximate analytical expressions
for the degree distributions assuming various distributions from which these
sizes are sampled. Our analytical expressions agree well with the stochastic
simulations. To further corroborate our finding, we present four real-world
cases, two of which are from the domain of natural languages while the other
two are from the domains of biology and society. The results obtained for each
of these cases are in agreement with our finding. The mathematical framework
that we develop is applicable not only to the analysis of the one-mode
projection of a-BiNs but also to the one-mode projection of any bipartite
network for which the degree distributions of the two partitions are known.
1. Introduction. Alphabetic Bipartite Networks (a-BiNs) are a special class of
complex bipartite networks that we introduced in [12] and that are appropriate
for modeling discrete combinatorial systems (DCS) [13]. A DCS consists of a finite
set of elementary units (e.g., codons and letters/phonemes) that serve as its basic
building blocks; the system, in turn, is a collection of a potentially infinite
number of discrete combinations of these units (e.g., genes and languages). A DCS can
be easily represented as a bipartite network where one of the partitions corresponds
to the elementary units and the other corresponds to the discrete combinations.
There exists an edge between an elementary unit and a discrete combination iff the
unit is a part of that discrete combination. The name a-BiN signifies the fact that
the set of elementary units, in both human and genetic languages, is referred to as
an Alphabet.

There have been many studies pertaining to bipartite networks where both the 
partitions grow with time [2, 7, 11, 14] and also some pertaining to non-growing 
bipartite networks [4, 10]. However, networks like a-BiNs, in which one of the partitions
remains fixed over time while the other grows, have received much less attention.
We identified this special class of networks in [3, 12] and have proposed a growth 
model for them that is based on preferential attachment coupled with a tunable 
randomness component. We also presented an exact analytical solution for the degree
distribution of the alphabet nodes for certain sub-cases of the model at any instant
of time. More specifically, we dealt with the cases of (a) sequential attachment
where each node in the growing partition enters the system in a sequential manner 
with only one edge which gets preferentially attached to an alphabet node (e.g., 
an a-BiN consisting of speakers and the corresponding language they speak), (b) 
parallel attachment without replacement where a node in the growing partition enters 
with more than one edge and attaches itself with a set of distinct nodes in the 
fixed partition (e.g., an a-BiN of phonemes and phoneme inventories where an edge 
indicates that a particular phoneme is present in a particular inventory) and (c) 
parallel attachment with replacement where a node in the growing partition enters 
with more than one edge and is allowed to attach itself to a set of non-distinct nodes 
in the fixed partition (e.g., an a-BiN of codons and the genes formed from them 
where an edge denotes that a particular codon is a part of a gene).

From a bipartite network, such as the a-BiN, we can construct the network of
shared discrete combinations, the so-called one-mode projection onto the elementary
units alone. Such a one-mode projection precisely represents a "collaboration
network" that is usually defined as a network of actors (analogous to the elemen- 
tary units) connected by a common collaboration act (analogous to being a part of 
the same discrete combination). The links in this network are representative of the 
intensity of collaboration between a pair of actors. In fact, there are a number of 
studies related to real-world bipartite networks and their corresponding one-mode 
projections [1, 11, 14]. For instance, it has been shown that for a movie-actor col- 
laboration network, the degree distribution of the actor nodes in both the bipartite 
network and the one-mode projection follows a power-law [11, 14]. Similarly, in the case
of scientific collaboration networks, it has been observed that the degree distribution
of the author nodes shows fat-tailed behavior in both the bipartite network
and the one-mode projection [14]. In the case of board-director networks it has
been found that the degree distribution of the director nodes in both the bipartite
and the one-mode network can be roughly fitted using exponential functions [1, 14].
Various models have been proposed and analytically solved to explain these observations
[6, 11, 14]. However, as shown in [12], the degree distribution of the alphabet
nodes in a-BiN asymptotically approaches a $\beta$-distribution rather than the widely
observed power-law distribution.

In all the aforementioned studies on one-mode projection, a general trend has 
been to assume the sizes of the discrete combinations (i.e., the degree of the nodes 
in the growing partition) to be fixed to a constant. However, real-world systems 
(including DCSs) present us with examples where this size varies: not all genes
are composed of the same number of codons, nor do all movies have the same cast
size. Therefore, these sizes need to be viewed as random variables sampled
from a distribution rather than as a constant, a fact that has not been taken into
consideration in earlier analytical studies pertaining to bipartite networks. In this
paper, our primary objective is to relax the general assumption outlined above and, 
thereby, present a detailed and in-depth analysis of the one-mode projection of 
a-BiN onto the alphabet nodes. 

In some of the previous studies, the main focus has been on the degree distribu- 
tion of the alphabet nodes of real-world a-BiNs like the phoneme-language network 
(phonemes/speech sounds are the basic units and the sound systems of languages 
are the discrete combinations) while the properties of the degree distribution of the
one-mode projection of such a network remain largely unexplained. For instance,
in [3] the degree distribution of the network of co-occurrence of phonemes (i.e., 
two phonemes are connected by an edge as many times as they co-occur across 
the sound systems of different languages) which is the one-mode projection of the 
phoneme-language network, differs significantly from the theoretical predictions. In 
this paper, we identify the gap in the analysis and suitably extend the theoretical 
framework to explain quite accurately the emergent degree distribution of the one- 
mode projection of a-BiNs. We relax the general assumption that the sizes of the 
discrete combinations are fixed and view them as random variables being sampled 
from a particular distribution. Our analytical results show that this distribution 
indeed affects the degree distribution of the one-mode projection even though it 
does not affect the degree distribution of a-BiN. In fact, these results are in good 
agreement with those obtained from the stochastic simulations. We further analyze
four real-world a-BiNs, two of which are taken from the domain of natural language
while the other two are from the domains of biology and society, respectively. We
observe that the empirical results obtained from the analysis of these a-BiNs agree 
with our finding. 

An equally important contribution of this work is that the theoretical framework 
developed can not only be applied to analyze the degree distribution of the one-mode 
projection of a-BiNs but also can be easily used for the analysis of the one-mode 
projection of any bipartite network in general for which the degree distributions of
the two partitions are known.

The rest of the paper is organized as follows. Section 2 presents a review of our 
work related to a-BiNs. In section 3, we analyze the degree distribution of the 
one-mode projection of a-BiN. In section 4, we corroborate our finding through the 
empirical analysis of four real-world a-BiNs. Finally, we conclude by enumerating 
our major contributions and pointing to some important implications of the current 
work. 






ANIMESH MUKHERJEE, MONOJIT CHOUDHURY AND NILOY GANGULY 



2. A brief review of the model and the associated predictions. A bipartite
graph $G$ is a 3-tuple $(U, V, E)$, where $U$ and $V$ are mutually exclusive sets of
nodes (also known as the two partitions) and $E \subseteq U \times V$ is the set of edges that
run between these partitions. Let us denote the elementary units of a DCS by the
nodes in the partition $U$ and let each unique discrete combination of the elementary
units be denoted as a node in the partition $V$. There exists an edge between a basic
unit $u \in U$ and a discrete combination $v \in V$ iff $u$ is a part of $v$.

The growth of this network has been described in [3, 12] through a simple model
based on preferential attachment coupled with a tunable randomness component.
Suppose that the partition $U$ has $N$ nodes labeled $u_1$ to $u_N$. At each time step,
a new node is introduced in the set $V$ which connects to $\mu$ nodes in $U$ based on
a predefined attachment rule. Let $v_t$ be the node added to $V$ during the $t$-th time
step. Let $A(k_i^t)$ denote the probability that a new node $v_t$ entering $V$ attaches itself
to a node $u_i \in U$, where $k_i^t$ refers to the degree of the node $u_i$ at time step $t$. $A(k_i^t)$
defines the attachment kernel and takes the form

$$A(k_i^t) = \frac{\gamma k_i^t + 1}{\sum_{j=1}^{N} (\gamma k_j^t + 1)} \qquad (1)$$

where $\gamma$ is the tunable parameter that controls the amount of randomness in the
system. The lower the value of $\gamma$, the higher the randomness. Using techniques
of linear algebra we can derive an approximate closed-form solution for the degree
distribution $p_{k,t}$ of the nodes in $U$ at time $t$ that
approaches a $\beta$-distribution asymptotically with time and can be expressed as

$$p_{k,t} = A\,(k/t)^{\gamma^{-1} - 1}\,(1 - k/t)^{\eta - \gamma^{-1} - 1} \qquad (2)$$

where $\eta = N/(\gamma\mu)$ and $A$ is a normalization constant.
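As a minimal sketch, and not the simulation code behind the results reported in this paper, the growth rule of eq. (1) could be written in Python roughly as follows. The function name `grow_abin` and its parameters are illustrative assumptions, and the sketch assumes parallel attachment with replacement, with the degrees held fixed while the $\mu$ edges of a new node are drawn.

```python
import random


def grow_abin(n_units, steps, mu, gamma, seed=0):
    """Grow an a-BiN with the kernel (gamma*k_i + 1) / sum_j(gamma*k_j + 1).

    Assumption: parallel attachment with replacement, i.e. each new node of V
    draws its mu edges independently, with the degrees frozen during the step.
    Returns the list of degrees k_i of the n_units nodes in U after `steps`.
    """
    rng = random.Random(seed)
    degree = [0] * n_units
    units = range(n_units)
    for _ in range(steps):
        # Unnormalised kernel; random.choices normalises the weights itself.
        kernel = [gamma * k + 1 for k in degree]
        for u in rng.choices(units, weights=kernel, k=mu):
            degree[u] += 1
    return degree


# Hypothetical small run: 50 alphabet nodes, 200 time steps, mu = 5, gamma = 2.
degrees = grow_abin(n_units=50, steps=200, mu=5, gamma=2.0)
```

After $t$ steps the degrees sum to $\mu t$, so the normalising sum in the kernel equals $\gamma\mu t + N$ regardless of how the individual edges were placed.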

Formally, for an a-BiN $(U, V, E)$ the one-mode projection onto the nodes $U$ is
a graph $G_U = (U, E_U)$, where $u_i, u_j \in U$ are connected (i.e., $(u_i, u_j) \in E_U$) if there
exists a node $v \in V$ such that $(u_i, v) \in E$ and $(u_j, v) \in E$. If there are $w$ such nodes
in $V$ which are connected to both $u_i$ and $u_j$ in $G$, then there are $w$ edges linking
$u_i$ and $u_j$ in the one-mode projection $G_U$. Alternatively, one can think of $G_U$ as a
weighted graph, where the weight of the edge $(u_i, u_j)$ is $w$.
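The definition above can be illustrated with a short Python sketch. The helper names `one_mode_projection` and `projected_degrees` and the toy inventories are hypothetical; each discrete combination is treated as a set of distinct units.

```python
from collections import Counter
from itertools import combinations


def one_mode_projection(combinations_of_units):
    """Project a bipartite network onto its elementary units.

    `combinations_of_units` is a list of discrete combinations, each given as
    an iterable of unit labels (a node of V together with its neighbours in U).
    Returns a Counter mapping an unordered pair (u_i, u_j) to the edge weight
    w, i.e. the number of combinations that contain both units.
    """
    weights = Counter()
    for combo in combinations_of_units:
        # Treat each combination as a set: a unit is either part of it or not.
        for u_i, u_j in combinations(sorted(set(combo)), 2):
            weights[(u_i, u_j)] += 1
    return weights


def projected_degrees(weights):
    """Degree of each unit in G_U, counting an edge of weight w as w edges."""
    degree = Counter()
    for (u_i, u_j), w in weights.items():
        degree[u_i] += w
        degree[u_j] += w
    return degree


# Hypothetical toy inventories over the alphabet {a, b, c, d}.
toy = [["a", "b", "c"], ["a", "b"], ["b", "c", "d"]]
w = one_mode_projection(toy)
deg = projected_degrees(w)
```

In this toy case the unit `b` belongs to combinations of sizes 3, 2 and 3, and its degree in the projection is $2 + 1 + 2 = 5$, one less than the size of each combination it is part of, summed over those combinations.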

One can easily calculate the degree of the nodes in the one-mode projection $G_U$
if each node introduced in $V$ connects to exactly $\mu$ nodes in $U$. In other words, the
size of each discrete combination in this case is assumed to be equal to a constant $\mu$.
Consider a node $u \in U$ that has degree $k$ in the bipartite network. Then $u$ is
connected to $k$ nodes in $V$ and each of these $k$ nodes is in turn connected to $\mu - 1$
other nodes in $U$. Defining the degree of a node as the number of edges attached to
it, in the one-mode projection $u$ has a degree of $q = k(\mu - 1)$. However, it is not
realistic to assume that the size of each discrete combination, i.e., the degree of each
node in $V$, is a constant $\mu$. Real-world DCSs present us with instances where this
size varies and therefore it has to be thought of as a random variable that is
sampled from a distribution. Indeed, as we shall see, this distribution affects the
degree distribution of the alphabet nodes in the one-mode projection even though
their degree distribution in the a-BiN remains unaffected. The reason for this is
that the degree $q$ of $u$ in the one-mode projection depends on this size while
the degree $k$ in the a-BiN does not, as long as the mean size of the discrete combinations
is equal to $\mu$. Note that in this case the denominator of eq. (1) is once again equal
to $\gamma\mu t + N$, as in our earlier model (i.e., where the size is a constant $\mu$). Hence, the
probability of attachment is largely the same and $k$ remains unchanged.
3. Analysis of the degree distribution of the one-mode projection of a-BiN.
Let us assume that the sizes of the discrete combinations are sampled from
a distribution $f_d$; then $q$ becomes sensitive to this distribution although $k$ does
not. Let us call the probability that the node $u$ having degree $k$ in a-BiN ends up
as a node having degree $q$ in the one-mode projection $F_k(q)$. Further, let us denote
the degree distribution of the alphabet nodes in the one-mode projection by $p_u(q)$.
If we assume that the degrees of the $k$ nodes in $V$ to which $u$ is connected are
$d_1, d_2, \ldots, d_k$ then we can write

$$q = \sum_{i=1}^{k} (d_i - 1) \qquad (3)$$

The probability that the node $u$ in a-BiN is connected to a node in $V$ of degree $d_1$
is proportional to $d_1 f_{d_1}$, of degree $d_2$ to $d_2 f_{d_2}$, $\ldots$, and of degree $d_k$ to $d_k f_{d_k}$.
One might apply the generating function (GF)
formalism introduced in [8] to calculate the degree distribution of the alphabet nodes
in the one-mode projection as follows. Let $f(x)$ denote the GF for the distribution of
the sizes of the discrete combinations. In other words, $f(x) = \sum_d f_d x^d$. Similarly,
let $p(x)$ denote the GF for the degree distribution of the alphabet nodes in a-BiN,
i.e., $p(x) = \sum_k p_k x^k$. Further, let $g(x)$ denote the GF for $p_u(q)$. Therefore,
$g(x) = \sum_q p_u(q) x^q$. The authors in [8] (see eq. (70)) have shown that $g(x)$ can be
correctly expressed as

$$g(x) = p(f'(x)/\mu) \qquad (4)$$

where $f'(x)$ is the derivative of $f(x)$. If $f_d$ and $p_k$ are distributions for which a closed form is known for $f(x)$ and $p(x)$
then it is easy to derive a closed-form solution for $g(x)$ (e.g., if both $f_d$ and $p_k$ are
Poisson-distributed). However, in our case, $p_k$ is $\beta$-distributed as shown in eq. (2)
and there is no known closed-form expression for $p(x)$. Therefore, it is difficult to
carry out our analysis any further using the GF formalism. Note that, in general,
there can be many instances of bipartite networks (in addition to a-BiNs) where
the closed-form expressions for the GFs of the distributions $f_d$ and $p_k$ are unknown.
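Although a closed form may be unavailable, eq. (4) can still be evaluated numerically when $p_k$ and $f_d$ are given as finite lists, since the composition of two polynomials is again a polynomial whose coefficients are $p_u(q)$. The following Python sketch illustrates this; the function names and the toy distributions are assumptions introduced here for illustration only.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def exact_projection_distribution(p, f):
    """Coefficients of g(x) = p(f'(x)/mu), i.e. p_u(q) for q = 0, 1, ...

    p[k] is the degree distribution of the alphabet nodes and f[d] the size
    distribution of the discrete combinations (both finite lists here).
    """
    mu = sum(d * f_d for d, f_d in enumerate(f))
    # f'(x)/mu: the coefficient of x^(d-1) is d*f_d/mu.
    h = [d * f_d / mu for d, f_d in enumerate(f)][1:]
    g = [0.0] * ((len(p) - 1) * (len(h) - 1) + 1)
    h_power = [1.0]  # h(x)^0
    for p_k in p:
        for q, c in enumerate(h_power):
            g[q] += p_k * c
        h_power = poly_mul(h_power, h)
    return g


# Hypothetical toy input: k is 1 or 2 with equal probability, sizes are 2 or 4.
p_u = exact_projection_distribution(p=[0.0, 0.5, 0.5], f=[0.0, 0.0, 0.5, 0.0, 0.5])
mean_q = sum(q * c for q, c in enumerate(p_u))
```

For this toy input the mean of $q$ is $p'(1)\,f''(1)/\mu = 1.5 \times 7/3 = 3.5$.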

An alternative way to approach the problem would be to calculate a generic
expression for $p_u(q)$ from first principles. We shall therefore attempt to obtain
such an expression, propose a suitable approximation for it and then check for
its dependence on the choice of $f_d$. As we shall see, in many cases it is even
possible to obtain a closed-form solution for the expression of $p_u(q)$. The appropriately
normalized probability that the node $u$ in a-BiN is connected to nodes of degree
$d_1, d_2, \ldots, d_k$ in $V$ is
$\left(\frac{d_1 f_{d_1}}{\mu}\right)\left(\frac{d_2 f_{d_2}}{\mu}\right)\cdots\left(\frac{d_k f_{d_k}}{\mu}\right)$
(each such connection is independent
of the others). Under the constraint $d_1 + d_2 + \cdots + d_k = q$, we have

$$F_k(q) = \sum_{d_1 + d_2 + \cdots + d_k = q} \frac{d_1 d_2 \cdots d_k}{\mu^k}\, f_{d_1} f_{d_2} \cdots f_{d_k} \qquad (5)$$

We can now add up these probabilities for all values of $k$, weighted by the probability
of finding a node of degree $k$ in a-BiN. Thus we have

$$p_u(q) = \sum_k p_k F_k(q) \qquad (6)$$

or,

$$p_u(q) = \sum_k p_k \sum_{d_1 + d_2 + \cdots + d_k = q} \frac{d_1 d_2 \cdots d_k}{\mu^k}\, f_{d_1} f_{d_2} \cdots f_{d_k} \qquad (7)$$

For the rest of the analysis, we shall assume that $d_1 d_2 \cdots d_k$ is approximately
equal to $\mu^k$. In other words, we assume that the arithmetic mean of the distribution
is close to the geometric mean, which holds when the variance of the distribution is
low. We shall shortly discuss in further detail the bounds of this approximation.
However, prior to that, let us investigate how this approximation helps in advancing
our analysis. Under the assumption $d_1 d_2 \cdots d_k / \mu^k = 1$, $F_k(q)$ can be thought of as the
distribution of the sum of $k$ random variables each sampled from $f_d$. In other
words, $F_k(q)$ tells us how the sum of the $k$ random variables is distributed if each
of these individual random variables is drawn from the distribution $f_d$. This
distribution of the sum can be obtained by convolving $f_d$ with itself $k$
times$^1$ (see [5] for details). If a closed-form expression for the convolution exists
for a distribution, then we can obtain an analytical expression for $p_u(q)$. In the
following, we shall attempt to find an expression for $p_u(q)$ assuming four different
forms of the distribution $f_d$. As we shall see, $F_k(q)$ is different for each of these
forms, thereby making the degree distribution of the nodes in $G_U$ sensitive to the
choice of $f_d$. Since in the expression for $q$ (eq. (3)) we need to subtract one from
each of the $d_i$ terms, the distribution $F_k(q)$ has to be shifted accordingly.
We shall denote this approximate and shifted version of $F_k(q)$ by $\tilde{F}_k(q)$.
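For a discrete size distribution given as a finite list, the approximate and shifted distribution and the resulting $p_u(q)$ can be computed directly by repeated convolution. The Python sketch below is one way to do so under the stated approximation; the function names and the toy input are illustrative assumptions.

```python
def convolve(a, b):
    """Discrete convolution of two probability lists (index = value)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def approx_projection_distribution(p, f):
    """Approximate p_u(q) as the p_k-weighted sum of k-fold convolutions.

    p[k] is the degree distribution of the alphabet nodes; f[d] is the size
    distribution with f[0] == 0. Each convolution factor is f shifted by one,
    because a combination of size d contributes d - 1 to q.
    """
    shifted = f[1:]  # shifted[j] = probability that d - 1 == j
    p_u = [0.0] * ((len(p) - 1) * (len(shifted) - 1) + 1)
    f_k = [1.0]  # k = 0: the empty sum is 0 with probability 1
    for p_k in p:
        for q, prob in enumerate(f_k):
            p_u[q] += p_k * prob
        f_k = convolve(f_k, shifted)
    return p_u


# Hypothetical toy input: k is 1 or 2 with equal probability, sizes are 2 or 4.
p_u = approx_projection_distribution(p=[0.0, 0.5, 0.5], f=[0.0, 0.0, 0.5, 0.0, 0.5])
mean_q = sum(q * prob for q, prob in enumerate(p_u))
```

With this toy input the approximate mean of $q$ is $1.5 \times (3 - 1) = 3$, i.e. the mean degree in the a-BiN times $\mu - 1$.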

3.1. Effect of the sampling distribution $f_d$. In this section, we shall analytically
study the effect of the sampling distribution $f_d$ on the degree distribution
of the one-mode projection of a-BiN. Note that while we use continuous functions
for the theoretical analysis, the simulations are carried out with their discrete
counterparts. In other words, we use probability mass functions rather than
probability density functions for our simulations.
Delta function: Let $f_d$ be a delta function of the form

$$f_d = \delta(d, \mu) = \begin{cases} 1 & \text{if } d = \mu \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

Note that this boils down to the case where the size of each discrete combination is
a constant $\mu$ and therefore $d_1 d_2 \cdots d_k / \mu^k$ is exactly equal to 1. If this delta function is
convolved $k$ times then the sum should be distributed as

$$\tilde{F}_k(q) = \delta(q, k\mu - k) = \begin{cases} 1 & \text{if } q = k\mu - k \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

Therefore, $p_u(q)$ is nonzero only when $q = k(\mu - 1)$, or $k = q/(\mu - 1)$, and we have (this
result has also been obtained through a different approach in [3])

$$p_u(q) = \begin{cases} p_k & \text{if } k = q/(\mu - 1) \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

Normal distribution: If $f_d$ is a normal distribution of the form $N(\mu, \sigma^2)$ then the sum
of $k$ random variables sampled from $f_d$ is again normally distributed, with
the form $N(k\mu, k\sigma^2)$. Therefore, $\tilde{F}_k(q)$ is given by

$$\tilde{F}_k(q) = N(k\mu - k, k\sigma^2) \qquad (11)$$

If we substitute the density function for $N$ we have

$$\tilde{F}_k(q) = \frac{1}{\sigma\sqrt{2\pi k}} \exp\left(-\frac{(q - k(\mu - 1))^2}{2k\sigma^2}\right) \qquad (12)$$

$^1$ Note that apart from a few special cases it is hard to convolve $d f_d$ (instead of $f_d$) $k$ times
and hence we chose to work with the approximate version of $F_k(q)$.

Hence, $p_u(q)$ is given by

$$p_u(q) = \frac{1}{\sigma\sqrt{2\pi}} \sum_k p_k\, k^{-0.5} \exp\left(-\frac{(q - k(\mu - 1))^2}{2k\sigma^2}\right) \qquad (13)$$
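Eq. (13) is a mixture of normal densities weighted by $p_k$ and is straightforward to evaluate numerically. The Python sketch below is illustrative; the function name `pu_normal` and the convention that `p[k]` holds $p_k$ are assumptions made here, and the term $k = 0$ is skipped because it has no density.

```python
import math


def pu_normal(q, p, mu, sigma):
    """Evaluate eq. (13): a p_k-weighted mixture of N(k(mu-1), k*sigma^2)."""
    total = 0.0
    for k, p_k in enumerate(p):
        if k == 0 or p_k == 0.0:
            continue  # k = 0 contributes no density term
        var = k * sigma ** 2
        density = math.exp(-(q - k * (mu - 1)) ** 2 / (2 * var))
        total += p_k * density / math.sqrt(2 * math.pi * var)
    return total


# Hypothetical check: if every alphabet node had k = 1, the mixture reduces to
# a single normal density centred at mu - 1.
peak = pu_normal(21, [0.0, 1.0], mu=22, sigma=13)
```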

Exponential distribution: If $f_d$ is an exponential distribution of the form $E(\lambda)$ where
$\lambda = 1/\mu$ then the sum of the $k$ random variables sampled from $f_d$ is known to take
the form of a gamma distribution $\Gamma(q; k, \mu)$. Therefore, we have

$$\tilde{F}_k(q) = \Gamma(q; k, \mu - 1) \qquad (14)$$

Substituting the density function we have

$$\tilde{F}_k(q) = \frac{\lambda' \exp(-\lambda' q)(\lambda' q)^{k-1}}{(k-1)!} \qquad (15)$$

where $\lambda' = 1/(\mu - 1)$. Hence, $p_u(q)$ is given by

$$p_u(q) = \lambda' \sum_k p_k \frac{\exp(-\lambda' q)(\lambda' q)^{k-1}}{(k-1)!} \qquad (16)$$
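For large $k$ the factorial in eq. (16) overflows quickly, so a numerical evaluation is more robust in log space. The Python sketch below takes that route; `pu_exponential` is a hypothetical helper name, `p[k]` is assumed to hold $p_k$, and $q$ must be positive.

```python
import math


def pu_exponential(q, p, mu):
    """Evaluate eq. (16) with lambda' = 1/(mu - 1); requires q > 0."""
    lam = 1.0 / (mu - 1)
    total = 0.0
    for k, p_k in enumerate(p):
        if k == 0 or p_k == 0.0:
            continue  # the gamma density is defined for k >= 1
        # log of lam * exp(-lam*q) * (lam*q)^(k-1) / (k-1)!
        log_term = (math.log(lam) - lam * q
                    + (k - 1) * math.log(lam * q) - math.lgamma(k))
        total += p_k * math.exp(log_term)
    return total


# Hypothetical check at q = mu - 1 with all the weight on k = 1.
value = pu_exponential(21, [0.0, 1.0], mu=22)
```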

Power-law distribution: There is no known exact solution for the sum of $k$ random
variables each of which is sampled from an $f_d$ that is power-law distributed with
exponent $\lambda_i$. However, as noted in [15, 16], asymptotically the tail of the distribution
obtained from the convolution$^2$ is dominated by the smallest exponent, that is,
$\min(\lambda_1, \lambda_2, \ldots, \lambda_k)$. Note that due to this approximation the resultant degree
distribution should match the stochastic simulations better towards
the tail. We have

$$\tilde{F}_k(q) \sim k\, q^{-\min(\lambda_1, \lambda_2, \ldots, \lambda_k)} \qquad (17)$$

However, since we are sampling from the same distribution each time, $\lambda_1 = \lambda_2 =
\cdots = \lambda_k = \lambda$ and

$$\tilde{F}_k(q) \sim k\, q^{-\lambda} \qquad (18)$$

Consequently, $p_u(q)$ can be expressed as

$$p_u(q) = \sum_k p_k\, k\, q^{-\lambda} \qquad (19)$$

Figure 1(a) shows the cumulative degree distribution (i.e., the probability $P_x$ that
a node has degree $> x$) of the alphabet nodes in a-BiN assuming that the degrees
of the nodes in $V$ are sampled from (i) normal, (ii) delta, (iii) exponential and
(iv) power-law distributions each having the same mean ($\mu = 22$). Note that the
standard deviation ($\sigma$) of the normal distribution is controlled in such a way that
the value of the random variable $d$ is never negative. The results are obtained
by averaging 100 stochastic simulations of the model. Since $k$ is not affected by
the choice of this distribution, $P_k$ remains the same as long as the means
of these distributions are the same. Figure 1(b) shows the degree distributions of the
one-mode projections corresponding to the a-BiNs generated for Figure 1(a). The
result clearly implies that the degree distribution of the one-mode projection varies
depending on how the degrees of the nodes in $V$ are distributed, although the degree
distribution remains unaffected for all the a-BiNs generated. Figure 1(c)-(f) shows
the match of the analytical expressions (with appropriate normalization) derived for
normal (eq. (13)), delta (eq. (10)), exponential (eq. (16)) and power-law (eq. (19))
with the respective stochastic simulations. The figures show that the analytically
obtained expressions are in good agreement with the simulations. Note that in the case
of the power-law, while the heavy tail matches perfectly, the low-degree zone deviates
slightly, which is a direct consequence of the approximation used in the convolution
theory for power-laws.

$^2$ Note that if $f_d$ is power-law distributed, one can actually compute the convolution of $d f_d$ also,
which is again given by a power-law with exponent $\min[(\lambda_1 - 1), (\lambda_2 - 1), \ldots, (\lambda_k - 1)]$.

Figure 1. Degree distribution of a-BiNs and their corresponding
one-mode projections in doubly-logarithmic scale. In all cases,
$N = 1000$, $t = 1000$ and $\gamma = 2$. For stochastic simulations, the
results are averaged over 100 runs. All the results are appropriately
normalized. (a) Degree distributions of alphabet nodes of
a-BiNs generated through stochastic simulations when the size of
a discrete combination is assumed to be sampled from a (i) normal
distribution ($\mu = 22$, $\sigma = 13$), (ii) delta function ($\mu = 22$), (iii)
exponential distribution ($\mu = 1/\lambda = 22$) and (iv) power-law distribution
($\lambda = 1.16$ and the mean $\mu = 22$); (b) the degree distributions of
the one-mode projections of the a-BiNs in (a); (c) match between
stochastic simulations (blue dots) and eq. (13) (pink dots) where
$\mu = 22$ and $\sigma = 13$; the green dots indicate the case where the sizes
of the discrete combinations are fixed to a constant $\mu = 22$; the
brown dots show how the result deteriorates when $\sigma$ is 100 times
larger; (d) match between stochastic simulations (blue dots) and
eq. (10) (pink dots) where $\mu = 22$; (e) match between stochastic
simulations (blue dots) and eq. (16) (pink dots) where $\mu = 1/\lambda = 22$;
the yellow dots show the plot for eq. (22); the green dots indicate
the case where the sizes of the discrete combinations are fixed to a
constant $\mu = 22$ (these dots are given as a reference to show that
even the approximate eq. (22) produces better results); (f) match
between stochastic simulations (blue dots) and eq. (19) (pink dots)
where $\lambda = 1.16$ and $\mu = 22$; the yellow dots show the plot for
eq. (23); the green dots indicating the case where the sizes of the
discrete combinations are fixed to a constant $\mu = 22$ are again provided
as a reference to demonstrate that it is much worse than even

Finally, it remains to be mentioned that in many cases it is possible to derive
a closed-form expression for $p_u(q)$. One can think of $p_k \tilde{F}_k(q)$ as a function $F$ in $q$
and $k$, i.e., $p_k \tilde{F}_k(q) = F(q, k)$. If $F(q, k)$ can be exactly (or approximately) factored
into a form like $F(q)F(k)$ then $p_u(q)$ becomes

$$p_u(q) = F(q) \sum_k F(k) \qquad (20)$$

Changing the sum in eq. (20) to its continuous form we have

$$p_u(q) = F(q) \int_0^\infty F(k)\,dk = A F(q) \qquad (21)$$

where $A$ is a constant. In other words, the nature of the resulting distribution is
dominated by the function $F(q)$. For instance, in the case of exponentially distributed
$f_d$, with some algebraic manipulations and certain approximations one can show
that (see the yellow dots in Figure 1(e))

p u (q) n Aexp (-^) (22) 

Similarly, in the case of the power-law one can show that (see the yellow dots in Figure 1(f))

$$p_u(q) \approx A q^{-\lambda} \qquad (23)$$

Therefore, it turns out that when this factorization is possible, the resulting degree
distribution of the one-mode projection is largely dominated by that part of the
convolution which depends only on $q$.

3.2. Approximation Bounds. We shall employ the GF formalism to find the
necessary condition (in the asymptotic limits) for our approximation to hold. More
precisely, we shall estimate the difference in the means (or the first moments)
of the exact and the approximate expressions for $p_u(q)$ and discuss when
this difference is negligible, which in turn serves as a necessary condition for the
approximation to be valid. We shall denote the generating function for the approximate
expression of $p_u(q)$ as $g_{app}(x)$. In this case, the GF encoding the probability
that the alphabet node $u$ is connected to a node in $V$ of degree $d$ is simply $f(x)/x$
and consequently, the GF of $\tilde{F}_k(q)$ is $(f(x)/x)^k$. Therefore,

$$g_{app}(x) = \sum_k p_k \left(\frac{f(x)}{x}\right)^k = p(f(x)/x) \qquad (24)$$

Now we can calculate the first moments for the approximate and the exact $p_u(q)$
by evaluating the derivatives of $g_{app}(x)$ and $g(x)$ respectively at $x = 1$. We have

$$g'_{app}(1) = \frac{d}{dx}\, p(f(x)/x)\Big|_{x=1} = (t/N)\,\mu(\mu - 1) \qquad (25)$$

Similarly,

$$g'(1) = \frac{d}{dx}\, p(f'(x)/\mu)\Big|_{x=1} = (t/N)\,\mu(\mu - 1) + (t/N)\,\sigma^2 \qquad (26)$$
Table 1. Examples of real-world networks.

a-BiN     One-mode   Data Source                      |U|    |V|
PlaNet    PhoNet     UPSID                            541    317
VlaNet    VoNet      UPSID                            151    317
ProComp   ProtNet    http://yeast-complexes.embl.de   959    373
StaTNet   StaNet     http://www.indianrail.gov.in     2764   1377


Thus, the mean of the approximate $p_u(q)$ is smaller than the actual mean by
$(t/N)\sigma^2$. Clearly, for $\sigma = 0$, the approximation gives us the exact solution, which
is indeed the case for delta functions. Also, in the asymptotic limits, if $\sigma^2 \ll N$
(with a scaling of $1/t$), the approximation holds good. However, as the value of $\sigma$
increases the results start deteriorating (see the brown dots in Figure 1(c)).
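Both first moments follow from the chain rule with $p'(1) = \mu t/N$ (the mean degree of an alphabet node), $\frac{d}{dx}[f(x)/x]_{x=1} = \mu - 1$ and $\frac{d}{dx}[f'(x)/\mu]_{x=1} = (\sigma^2 + \mu^2 - \mu)/\mu$. As a worked example, the Python sketch below evaluates eqs. (25) and (26) for the parameter values quoted for the normal case in Figure 1; the function name `projection_means` is an assumption introduced here.

```python
def projection_means(mu, sigma, t, n_units):
    """First moments of the exact and approximate p_u(q), eqs. (25)-(26)."""
    ratio = t / n_units
    approx = ratio * mu * (mu - 1)
    exact = approx + ratio * sigma ** 2
    return exact, approx


# Values quoted for the normal case: mu = 22, sigma = 13, N = t = 1000.
exact, approx = projection_means(mu=22, sigma=13, t=1000, n_units=1000)
relative_gap = (exact - approx) / exact
```

Here the approximate mean is 462 against an exact mean of 631, and the gap vanishes when `sigma` is set to 0, as expected for the delta function.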

4. Experiments with real-world a-BiNs. In this section, we present case
studies of four different real-world a-BiNs to corroborate our findings in the earlier
section. While the first two are instances from natural language, the third and the
fourth are instances from biology and society respectively. In each case, the results
obtained assuming the actual distribution of the size of the discrete combinations
outperform those obtained assuming the size to be a constant (always equal to the
average size of the discrete combinations). For all the stochastic simulations of the
model, $\gamma_{best}$ indicates the value of the parameter $\gamma$ for which the degree distribution
of the alphabet nodes in the generated bipartite network best matches
that of the corresponding real bipartite network (i.e., when the mean difference in the
values of $P_k$ for similar $k$ is least between the two degree distributions). The four
networks and the associated results are as follows.
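The selection of $\gamma_{best}$ can be sketched in Python as follows. This is only one plausible reading of the criterion: we assume that the "mean difference" is the mean absolute difference between the two distributions taken at the same degree $k$, and the function names and toy numbers are hypothetical.

```python
def mean_abs_difference(p_real, p_sim):
    """Mean absolute difference between two degree distributions, compared
    value by value at the same degree k (missing entries count as zero)."""
    length = max(len(p_real), len(p_sim))
    a = list(p_real) + [0.0] * (length - len(p_real))
    b = list(p_sim) + [0.0] * (length - len(p_sim))
    return sum(abs(x - y) for x, y in zip(a, b)) / length


def best_gamma(p_real, simulated):
    """Pick the gamma whose simulated distribution is closest to p_real.

    `simulated` maps each candidate gamma to the distribution it produced.
    """
    return min(simulated, key=lambda g: mean_abs_difference(p_real, simulated[g]))


# Hypothetical toy distributions for three candidate values of gamma.
p_real = [0.2, 0.5, 0.3]
simulated = {0.5: [0.3, 0.4, 0.3], 2.0: [0.2, 0.5, 0.3], 8.0: [0.6, 0.2, 0.2]}
gamma_best = best_gamma(p_real, simulated)
```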

Phoneme-phoneme Network (PhoNet). PlaNet or the Phoneme-language Network
is an a-BiN where the two partitions respectively stand for the consonants
(fixed alphabet) and the language inventories (growing discrete combinations), where
an individual inventory is defined as the repertoire of distinct consonants used by
the speakers of that language for communication. PhoNet is the corresponding
one-mode projection of PlaNet onto the consonant nodes (see Table 1 for details).
Figure 2 clearly shows that the degree distribution obtained from the stochastic
simulations (for $\gamma_{best}$) of the model assuming the actual distribution of the consonant
inventory sizes (blue symbols) matches that of the
empirical PhoNet (red symbols) significantly better than the case where the sizes are assumed to be
equal to a constant (green symbols).

Vowel-vowel Network (VoNet). VlaNet or the Vowel-language Network is similar
to PlaNet with the exception that the two partitions here respectively stand for
the universal set of vowels (fixed alphabet) and the language inventories (growing
discrete combinations)$^3$. VoNet (like PhoNet) is the one-mode projection of VlaNet
onto the vowel nodes (see Table 1 for details). Once again we observe (see Figure 2)
that the degree distribution of the empirical VoNet (red symbols) indicates a closer
match with the case where the stochastic simulation (for $\gamma_{best}$) assumes the actual
vowel inventory size distribution (blue symbols) than the case where it assumes
the size to be a constant (green symbols).



$^3$ We have modeled the consonant and the vowel inventories separately because the structure of
these inventories shows distinct differences [9], although there are certain similarities among them.
Figure 2. Degree distribution of PhoNet, VoNet, ProtNet and
StaNet along with the fits obtained from the synthesis model. The
x-axis is in the logarithmic scale. Red symbols indicate the degree
distribution from real data, green dots refer to the degree distribution
obtained where the degrees of the nodes in $V$ are assumed to be a constant
and blue dots refer to the degree distribution obtained if the actual
distribution of the sizes of the phoneme inventories is assumed.
The simulations are averaged over 100 runs.

Protein-protein Network (ProtNet). ProComp or the Protein-Complex Network
is an a-BiN where the nodes in $U$ correspond to distinct proteins and those in $V$
correspond to unique protein complexes. ProtNet is the corresponding one-mode
projection of ProComp onto the protein nodes (see Table 1 for details). The degree
distribution (see Figure 2) of empirical ProtNet (red symbols) matches the simulation
results (for $\gamma_{best}$) significantly better if the actual complex size distribution is
taken into account (blue symbols) than in the case where this size is assumed
to be a constant (green symbols).

Station-station Network (StaNet). StaTNet or the Station-Train network is
an a-BiN in which the nodes in the partition $U$ represent the stations$^4$ while those
in $V$ represent the trains, and there is an edge in this network iff a train in its route
halts at a station. StaNet is the one-mode projection of StaTNet onto the station
nodes (see Table 1 for details). Once again, the degree distribution (see Figure 2)
of empirical StaNet (red symbols) shows a closer match with the case where the
stochastic simulations (for $\gamma_{best}$) are performed assuming the distribution of train
frequency (blue symbols) than the case where this frequency is assumed to be equal
to the average frequency and hence a constant (green symbols).



$^4$ For a fully developed railway transport system, the number of stations is almost fixed or
grows at an extremely slow rate.
5. Conclusion. In this paper, we have analyzed the degree distribution of the
one-mode projection of an a-BiN used for modeling discrete combinatorial systems.
More specifically, we identified that the degree distribution of this network
varies if the sizes of the discrete combinations are assumed to be random variables
rather than a constant. Further, we derived approximate analytical expressions
(and closed-form solutions in certain cases) for the degree distributions assuming
different distributions from which these sizes are sampled. The analytical results
are in good agreement with the stochastic simulations. We also discussed
the bounds of the approximation used for deriving the analytical expression for the
degree distribution. Finally, we presented four case studies to further support our
finding.

It is worthwhile to mention here that the theoretical framework that we developed 
for analyzing the degree distribution of the one-mode projection of an a-BiN can be 
applied to any other type of bipartite network in general. For instance, in case of 
the model proposed in [14] for movie-actor networks, the distribution of the cast size 
in the bipartite network is assumed to be a constant while the degree distribution 
$p_k$ of the actor nodes is found to follow a power-law. Therefore, it is easy to see that
this particular case shall follow an analysis very similar to that outlined in eqs. (8), 
(9) and (10). 